Most cross-device federated learning (FL) studies focus on the model-homogeneous setting where the global server model and local client models are identical. However, such a constraint not only excludes low-end clients, who would otherwise make unique contributions to model training, but also prevents clients from training large models due to on-device resource bottlenecks. In this work, we propose FedRolex, a partial training (PT)-based approach that enables model-heterogeneous FL and can train a global server model larger than the largest client model. At its core, FedRolex employs a rolling sub-model extraction scheme that allows different parts of the global server model to be trained evenly, which mitigates the client drift induced by the inconsistency between individual client models and the server model architecture. We show that FedRolex outperforms state-of-the-art PT-based model-heterogeneous FL methods (e.g., Federated Dropout) and reduces the gap between model-heterogeneous and model-homogeneous FL, especially in the large-model, large-dataset regime. In addition, we provide a theoretical statistical analysis of its advantage over Federated Dropout and evaluate FedRolex on an emulated real-world device distribution to show that FedRolex can enhance the inclusiveness of FL and boost the performance of low-end devices that would otherwise not benefit from FL. Our code is available at https://github.com/MSU-MLSys-Lab/FedRolex.
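As a minimal sketch (not the authors' released code; all names and sizes here are illustrative), the rolling sub-model extraction can be pictured as a wrap-around window over a layer's hidden units that advances by one index each round, so every unit of the global server model is trained equally often:

```python
import numpy as np

def rolling_submodel_indices(round_idx, global_width, client_width):
    """Rolling window: start one index later each round and wrap around,
    so every hidden unit of the global layer is trained equally often."""
    start = round_idx % global_width
    return np.array([(start + j) % global_width for j in range(client_width)])

def extract_submodel(global_layer, round_idx, client_width):
    """Slice out the rows (output units) of a dense layer for this round."""
    idx = rolling_submodel_indices(round_idx, global_layer.shape[0], client_width)
    return global_layer[idx]

# Toy example: a 6-unit global layer, a client that can only hold 4 units.
W = np.arange(18, dtype=float).reshape(6, 3)
print(rolling_submodel_indices(0, 6, 4))  # [0 1 2 3]
print(rolling_submodel_indices(5, 6, 4))  # [5 0 1 2] (wraps around)
```

Because the window rolls rather than being sampled at random (as in Federated Dropout), no part of the global model is systematically under-trained.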
Translated by Google Translate
Two DL models, CheXNet and CheXDet, were developed using radiograph-level annotations (presence or absence of disease) and fine-grained lesion-level annotations (lesion bounding boxes), respectively. The models' internal classification and lesion localization performance were compared on a testing set (n = 2,922); external classification performance was compared on the NIH-Google (n = 4,376) and PadChest (n = 24,536) datasets; and external lesion localization performance was compared on the NIH-ChestX-ray14 dataset (n = 880). The models were also compared with radiologists on a subset of the internal testing set (n = 496). Given sufficient training data, both models performed comparably to radiologists. CheXDet achieved significantly better external classification performance, e.g., on NIH-Google (CheXDet area under the ROC curve [AUC]: 0.67; CheXNet AUC: 0.51; p < .001) and PadChest (CheXDet AUC: 0.78; CheXNet AUC: 0.55; p < .001). CheXDet also achieved higher lesion localization performance for most abnormalities on all datasets, e.g., in detecting pneumothorax on the internal set (CheXDet jackknife alternative free-response ROC figure of merit [JAFROC-FOM]: 0.87; CheXNet JAFROC-FOM: 0.113; p < .001) and on NIH-ChestX-ray14 (CheXDet JAFROC-FOM: 0.55; CheXNet JAFROC-FOM: 0.04; p < .001). In summary, fine-grained annotations overcame shortcut learning and enabled DL models to identify correct lesion patterns, improving the models' generalizability.
Medical coding is a complex task that requires assigning a subset of more than 72,000 ICD codes to a patient's notes. Modern natural language processing approaches to this task are challenged by the length of the input and the size of the output space. We limit our model's input to small windows around the medical entities found in the document. From these local contexts, we build contextualized representations of both ICD codes and entities, and aggregate these representations to form document-level predictions. In contrast to existing approaches, which use code representations that are fixed in size or fixed at training time, we represent ICD codes by encoding code descriptions together with local context. We discuss metrics suitable for deploying coding systems in practice. We show that our approach outperforms existing methods on both standard and deployable measures, including performance on rare and unseen codes.
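A minimal sketch of the input restriction described above, assuming tokenized notes and pre-detected entity spans (the function name and window size are illustrative, not from the paper):

```python
def local_context_windows(tokens, entity_spans, window=5):
    """Keep only a small symmetric window of tokens around each detected
    medical entity; the model never sees the full document."""
    windows = []
    for start, end in entity_spans:  # spans are [start, end) token indices
        lo = max(0, start - window)
        hi = min(len(tokens), end + window)
        windows.append(tokens[lo:hi])
    return windows

notes = [f"tok{i}" for i in range(30)]
print(local_context_windows(notes, [(10, 12)], window=2))  # tokens 8..13
```

Each window is then encoded independently, which keeps the sequence length small regardless of how long the underlying note is.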
Most existing object detection work is based on bounding-box annotations: each object has a precisely annotated box. For rib fractures, however, bounding-box annotation is very labor-intensive and time-consuming, because radiologists need to investigate and annotate rib fractures on a slice-by-slice basis. Although some studies have proposed weakly supervised or semi-supervised methods, they cannot handle different forms of supervision simultaneously. In this paper, we propose a novel omni-supervised object detection network that can exploit multiple different forms of annotated data to further improve detection performance. Specifically, the proposed network contains an omni-supervised detection head in which each form of annotated data corresponds to a unique classification branch. Furthermore, we propose a dynamic label assignment strategy for the different forms of annotated data to facilitate better learning for each branch. In addition, we design a confidence-aware classification loss to emphasize highly confident samples and further improve model performance. Extensive experiments on the testing dataset show that our proposed method consistently outperforms other state-of-the-art approaches, demonstrating the efficacy of deep omni-supervised learning for improving rib fracture detection performance.
Deep learning models are frequently reported to learn from shortcuts such as dataset biases. As deep learning plays an increasingly important role in modern healthcare systems, combating shortcut learning in medical data and developing unbiased and trustworthy models is in great demand. In this paper, we study the problem of developing a debiased chest X-ray diagnosis model from biased training data without knowing the bias labels. We start from the observations that imbalance in the bias distribution is one of the key causes of shortcut learning, and that models prefer dataset biases because they are easier to learn than the intended features. Based on these observations, we propose a novel algorithm, pseudo bias-balanced learning, which first captures and predicts per-sample bias labels via a generalized cross-entropy loss, and then trains a debiased model using the pseudo bias labels and a bias-balanced softmax function. We construct several chest X-ray datasets with various dataset bias situations and demonstrate with extensive experiments that our proposed method achieves consistent improvements over other state-of-the-art approaches.
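The bias-capturing step relies on the generalized cross-entropy (GCE) loss, which up-weights easy (often bias-aligned) samples relative to standard cross-entropy. A minimal NumPy sketch, assuming softmax probabilities and integer labels (the q value below is illustrative):

```python
import numpy as np

def generalized_cross_entropy(probs, labels, q=0.7):
    """GCE loss: L_q = (1 - p_y^q) / q, where p_y is the predicted
    probability of the true class. As q -> 0 this recovers standard
    cross-entropy; larger q flattens the loss on hard samples, so a
    model trained with it latches onto the easy, bias-aligned ones."""
    p_y = probs[np.arange(len(labels)), labels]
    return float(np.mean((1.0 - p_y ** q) / q))
```

A model trained with this loss serves as the bias capturer whose predictions become the pseudo bias labels for the subsequent bias-balanced training stage.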
Multi-task learning is useful in NLP because it is often practically desirable to have a single model that works across a range of tasks. In the medical domain, sequential training on tasks may sometimes be the only way to train models, either because access to the original (potentially sensitive) data is no longer available, or simply owing to the computational costs inherent in joint retraining. A major problem inherent in sequential learning, however, is catastrophic forgetting, i.e., a substantial drop in accuracy on previous tasks when a model is updated for a new task. Elastic weight consolidation is a recently proposed method to address this problem, but scaling this approach to the modern large models used in practice requires strong independence assumptions about the model parameters, limiting its effectiveness. In this work, we apply Kronecker factorization, a recent approach that relaxes the independence assumption, to prevent catastrophic forgetting in convolutional and Transformer-based neural networks at scale. We show the effectiveness of this technique on the important and illustrative task of medical entity linking across three datasets, demonstrating that the technique can be used to make efficient updates to existing methods as new medical data becomes available. On average, the proposed method reduces catastrophic forgetting by 51% when using a BERT-based model, compared to a 27% reduction from standard elastic weight consolidation, while maintaining a space complexity proportional to the number of model parameters.
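To make the contrast concrete, here is a hedged NumPy sketch (not the paper's code) of the standard diagonal EWC penalty next to a Kronecker-factored variant, which approximates a layer's Fisher matrix as A ⊗ G and thereby drops the per-parameter independence assumption without ever materializing the full Fisher:

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher_diag, lam=1.0):
    """Standard EWC: a diagonal Fisher assumes parameters are independent."""
    return 0.5 * lam * float(np.sum(fisher_diag * (theta - theta_star) ** 2))

def kfac_ewc_penalty(dW, A, G, lam=1.0):
    """Kronecker-factored EWC for one weight matrix W (dW = W - W*).
    With F ~= A (x) G (A: input covariance, G: output-gradient covariance),
    the penalty vec(dW)^T F vec(dW) equals tr(G dW A dW^T), computed
    without forming the (n*m) x (n*m) Fisher matrix."""
    return 0.5 * lam * float(np.trace(G @ dW @ A @ dW.T))
```

The trace form is what makes the approach scale: for a BERT-sized layer, only the two small factors A and G need to be stored, keeping memory proportional to the number of parameters.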
This work addresses the problem of estimating the full-body 3D human pose and shape from a single color image. This is a task where iterative optimization-based solutions have typically prevailed, while Convolutional Networks (ConvNets) have suffered because of the lack of training data and their low-resolution 3D predictions. Our work aims to bridge this gap and proposes an efficient and effective direct prediction method based on ConvNets. A central part of our approach is the incorporation of a parametric statistical body shape model (SMPL) within our end-to-end framework. This allows us to get very detailed 3D mesh results while requiring estimation of only a small number of parameters, making it amenable to direct network prediction. Interestingly, we demonstrate that these parameters can be predicted reliably from only 2D keypoints and masks. These are typical outputs of generic 2D human analysis ConvNets, allowing us to relax the demanding requirement that images with 3D shape ground truth be available for training. Simultaneously, by maintaining differentiability, at training time we generate the 3D mesh from the estimated parameters and optimize explicitly for the surface using a 3D per-vertex loss. Finally, a differentiable renderer is employed to project the 3D mesh to the image, which enables further refinement of the network by optimizing for the consistency of the projection with 2D annotations (i.e., 2D keypoints or masks). The proposed approach outperforms previous baselines on this task and offers an attractive solution for direct prediction of 3D shape from a single color image.
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains highly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
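As a hedged sketch of the simpler of the two attacks (the array shapes and patch location below are illustrative, not the paper's exact setup), NAIVEATTACK amounts to stamping a fixed trigger patch onto raw images before distillation ever starts:

```python
import numpy as np

def add_trigger(images, trigger):
    """Stamp a fixed trigger patch into the bottom-right corner of each
    image (NCHW array); the poisoned copies then feed the distillation
    step, so the backdoor is baked into the synthetic dataset itself."""
    poisoned = images.copy()
    th, tw = trigger.shape
    poisoned[..., -th:, -tw:] = trigger  # broadcast over batch/channels
    return poisoned
```

DOORPING differs in that the trigger is not fixed: it is re-optimized at every distillation iteration, which is what pushes its attack success rate close to 1.0.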
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold: First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., the feature level and the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and models will be available.